Oozie Shell Action Extension

Shell Action
    Shell Action Logging
    Shell Action Limitations
Appendix, Shell XML-Schema
    AE.A Appendix A, Shell XML-Schema
        Shell Action Schema Version 0.2
        Shell Action Schema Version 0.1

首先配置job-tracker, name-node , Shell exec 元素,必须配置参数和配置.

两个必要条件:1.输出格式必须是一个有效Java属性文件.2.输出大小在2KB之内

语法:

<workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.3">
    ...
    <action name="[NODE-NAME]">
        <shell xmlns="uri:oozie:shell-action:0.1">
            <job-tracker>[JOB-TRACKER]</job-tracker>
            <name-node>[NAME-NODE]</name-node>
            <prepare>
               <delete path="[PATH]"/>
               ...
               <mkdir path="[PATH]"/>
               ...
            </prepare>
            <job-xml>[SHELL SETTINGS FILE]</job-xml>
            <configuration>
                <property>
                    <name>[PROPERTY-NAME]</name>
                    <value>[PROPERTY-VALUE]</value>
                </property>
                ...
            </configuration>
            <exec>[SHELL-COMMAND]</exec>
            <argument>[ARG-VALUE]</argument>
                ...
            <argument>[ARG-VALUE]</argument>
            <env-var>[VAR1=VALUE1]</env-var>
               ...
            <env-var>[VARN=VALUEN]</env-var>
            <file>[FILE-PATH]</file>
            ...
            <archive>[FILE-PATH]</archive>
            ...
            <capture-output/>
        </shell>
        <ok to="[NODE-NAME]"/>
        <error to="[NODE-NAME]"/>
    </action>
    ...
</workflow-app>

prepare:启动作业前删除或创建的路径列表,指定的路径必须以 hdfs://HOST:PORT开头 .

job-xml:包含Shell作业的配置文件,在schema 0.2,可以允许多个job-xml元素来指定多个job.xml文件.

configuration:包含的配置属性将被提交到Shell作业中.

exec包含Shell命令执行路径,Shell命令将使用一个或多个argument元素.

argument包含将要提交的参数.

env-var包含环境变量并提交到Shell命令中.env-var是一个环境变量键值对. If the pair contains the variable such as $PATH, it should follow the Unix convention such as PATH=$PATH:mypath. 不要使用 ${PATH} ,这样将被Oozie El evaluator替换.

shell:节点创建一个Hadoop配置,这个配置作为一个有效本地文件在它运行的目录下. The exact file path is exposed to the spawned shell using the environment variable called OOZIE_ACTION_CONF_XML .The Shell application can access the environemnt variable to read the action configuration XML file path.

capture-output:捕捉shell命令执行的STDOUT为输出. The Shell command output必须在Java 配置文件格式并且不能被执行超过2KB.在WF定义内,shell活动节点的输出可以通过String action:output(String node, String key)函数来访问(Refer to section '4.2.6 Action EL Functions').

Example:

使用任何shell/perl 脚本或CPP executable

<workflow-app xmlns='uri:oozie:workflow:0.3' name='shell-wf'>
    <start to='shell1' />
    <action name='shell1'>
        <shell xmlns="uri:oozie:shell-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                  <name>mapred.job.queue.name</name>
                  <value>${queueName}</value>
                </property>
            </configuration>
            <exec>${EXEC}</exec>
            <argument>A</argument>
            <argument>B</argument>
            <file>${EXEC}#${EXEC}</file> <!--Copy the executable to compute node's current working directory -->
        </shell>
        <ok to="end" />
        <error to="fail" />
    </action>
    <kill name="fail">
        <message>Script failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name='end' />
</workflow-app>

提交Oozie作业的相应的作业属性文件如下:

oozie.wf.application.path=hdfs://localhost:8020/user/kamrul/workflows/script#Execute is expected to be in the Workflow directory.
#Shell Script to run
EXEC=script.sh
#CPP executable. Executable should be binary compatible to the compute node OS.
#EXEC=hello
#Perl script
#EXEC=script.pl
jobTracker=localhost:8021
nameNode=hdfs://localhost:8020
queueName=default

How to run any java program bundles in a jar.

<workflow-app xmlns='uri:oozie:workflow:0.3' name='shell-wf'>
    <start to='shell1' />
    <action name='shell1'>
        <shell xmlns="uri:oozie:shell-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                  <name>mapred.job.queue.name</name>
                  <value>${queueName}</value>
                </property>
            </configuration>
            <exec>java</exec>
            <argument>-classpath</argument>
            <argument>./${EXEC}:$CLASSPATH</argument>
            <argument>Hello</argument>
            <file>${EXEC}#${EXEC}</file> <!--Copy the jar to compute node current working directory -->
        </shell>
        <ok to="end" />
        <error to="fail" />
    </action>
    <kill name="fail">
        <message>Script failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name='end' />
</workflow-app>

The corresponding job properties file used to submit Oozie job could be as follows:

提交Oozie作业的相应的作业属性文件如下:

oozie.wf.application.path=hdfs://localhost:8020/user/kamrul/workflows/script#Hello.jar file is expected to be in the Workflow directory.
EXEC=Hello.jar
jobTracker=localhost:8021
nameNode=hdfs://localhost:8020
queueName=default

Shell Action Limitations

Although Shell action can execute any shell command, there are some limitations.

  1. No interactive command is supported.
  2. Command can't be executed as different user using sudo.
  3. User has to explicitly upload the required 3rd party packages (such as jar, so lib, executable etc). Oozie provides a way using and tag through Hadoop's Distributed Cache to upload.
  4. Since Oozie will execute the shell command into a Hadoop compute node, the default installation of utility in the compute node might not be fixed. However, the most common unix utilities are usually installed on all compute nodes. It is important to note that Oozie could only support the commands that are installed into the compute nodes or that are uploaded through Distributed Cache.

Shell Action Schema Version 0.2

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:shell="uri:oozie:shell-action:0.2" elementFormDefault="qualified"
           targetNamespace="uri:oozie:shell-action:0.2">    <xs:element name="shell" type="shell:ACTION"/>
    <xs:complexType name="ACTION">
      <xs:sequence>
            <xs:element name="job-tracker" type="xs:string" minOccurs="1" maxOccurs="1"/>
            <xs:element name="name-node" type="xs:string" minOccurs="1" maxOccurs="1"/>
            <xs:element name="prepare" type="shell:PREPARE" minOccurs="0" maxOccurs="1"/>
            <xs:element name="job-xml" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
            <xs:element name="configuration" type="shell:CONFIGURATION" minOccurs="0" maxOccurs="1"/>
            <xs:element name="exec" type="xs:string" minOccurs="1" maxOccurs="1"/>
            <xs:element name="argument" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
            <xs:element name="env-var" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
            <xs:element name="file" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
            <xs:element name="archive" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
            <xs:element name="capture-output" type="shell:FLAG" minOccurs="0" maxOccurs="1"/>
        </xs:sequence>
    </xs:complexType>
    <xs:complexType name="FLAG"/>
    <xs:complexType name="CONFIGURATION">
        <xs:sequence>
            <xs:element name="property" minOccurs="1" maxOccurs="unbounded">
                <xs:complexType>
                    <xs:sequence>
                        <xs:element name="name" minOccurs="1" maxOccurs="1" type="xs:string"/>
                        <xs:element name="value" minOccurs="1" maxOccurs="1" type="xs:string"/>
                        <xs:element name="description" minOccurs="0" maxOccurs="1" type="xs:string"/>
                    </xs:sequence>
                </xs:complexType>
            </xs:element>
        </xs:sequence>
    </xs:complexType>
    <xs:complexType name="PREPARE">
        <xs:sequence>
            <xs:element name="delete" type="shell:DELETE" minOccurs="0" maxOccurs="unbounded"/>
            <xs:element name="mkdir" type="shell:MKDIR" minOccurs="0" maxOccurs="unbounded"/>
        </xs:sequence>
    </xs:complexType>
    <xs:complexType name="DELETE">
        <xs:attribute name="path" type="xs:string" use="required"/>
    </xs:complexType>
    <xs:complexType name="MKDIR">
        <xs:attribute name="path" type="xs:string" use="required"/>
    </xs:complexType>
</xs:schema>

Shell Action Schema Version 0.1

<?xml version="1.0" encoding="UTF-8"?>
<!--
  Licensed to the Apache Software Foundation (ASF) under one
  or more contributor license agreements.  See the NOTICE file
  distributed with this work for additional information
  regarding copyright ownership.  The ASF licenses this file
  to you under the Apache License, Version 2.0 (the
  "License"); you may not use this file except in compliance
  with the License.  You may obtain a copy of the License at       http://www.apache.org/licenses/LICENSE-2.0
  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
-->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:shell="uri:oozie:shell-action:0.1" elementFormDefault="qualified"
           targetNamespace="uri:oozie:shell-action:0.1">
    <xs:element name="shell" type="shell:ACTION"/>
    <xs:complexType name="ACTION">
      <xs:sequence>
            <xs:element name="job-tracker" type="xs:string" minOccurs="1" maxOccurs="1"/>
            <xs:element name="name-node" type="xs:string" minOccurs="1" maxOccurs="1"/>
            <xs:element name="prepare" type="shell:PREPARE" minOccurs="0" maxOccurs="1"/>
            <xs:element name="job-xml" type="xs:string" minOccurs="0" maxOccurs="1"/>
            <xs:element name="configuration" type="shell:CONFIGURATION" minOccurs="0" maxOccurs="1"/>
             <xs:element name="exec" type="xs:string" minOccurs="1" maxOccurs="1"/>
            <xs:element name="argument" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
            <xs:element name="env-var" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
            <xs:element name="file" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
            <xs:element name="archive" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
            <xs:element name="capture-output" type="shell:FLAG" minOccurs="0" maxOccurs="1"/>
        </xs:sequence>
    </xs:complexType>
    <xs:complexType name="FLAG"/>
    <xs:complexType name="CONFIGURATION">
        <xs:sequence>
            <xs:element name="property" minOccurs="1" maxOccurs="unbounded">
                <xs:complexType>
                    <xs:sequence>
                        <xs:element name="name" minOccurs="1" maxOccurs="1" type="xs:string"/>
                        <xs:element name="value" minOccurs="1" maxOccurs="1" type="xs:string"/>
                        <xs:element name="description" minOccurs="0" maxOccurs="1" type="xs:string"/>
                    </xs:sequence>
                </xs:complexType>
            </xs:element>
        </xs:sequence>
    </xs:complexType>
    <xs:complexType name="PREPARE">
        <xs:sequence>
            <xs:element name="delete" type="shell:DELETE" minOccurs="0" maxOccurs="unbounded"/>
            <xs:element name="mkdir" type="shell:MKDIR" minOccurs="0" maxOccurs="unbounded"/>
        </xs:sequence>
    </xs:complexType>
    <xs:complexType name="DELETE">
        <xs:attribute name="path" type="xs:string" use="required"/>
    </xs:complexType>
    <xs:complexType name="MKDIR">
        <xs:attribute name="path" type="xs:string" use="required"/>
    </xs:complexType>
</xs:schema>